MUSIC RESEARCH ON COMPUTERS 


Why And How To Make Computers Sing 


Kirit S. Parikh 


I believe that it may be possible to make computers sing or play Indian
Classical Music and that computers may be made to do this in the style and
the voice or instrument of any given musician. Furthermore, I also believe
that we should make computers do this, for in the process of making
computers sing, we will gain enormous insights into many aspects of music.
We will know precisely and quantitatively what characterizes various forms
of voices and tones produced by musicians. We will know what is the
structure of a raga, as created by a particular artist, and we may even be
able to measure quantitatively the similarities and differences between the
styles of two artists. We may gain insight into the characteristics of the
style of a particular musician and may learn what makes his music his own.
The increased quantitative understanding of the structure of melody may
give us a way to identify the common themes that recur in ethnic music
around the country and maybe even around the world.


I will confine myself to vocal music simply because it is the more 
complicated to produce on a computer. Nevertheless, whatever I say is
equally applicable to instrumental music. 


Singing of a melody phrase consists of two activities. First, one has 
to compose the phrase and then one has to render it. In reality, these two 
activities are taking place simultaneously, but for our purpose we can look 
at them one after the other. 


I will concentrate mainly on the first activity, namely composing.
As I have much more to say about composing, let us first treat the rendering 
aspect of music. 


How does a computer produce sound? To understand this, think of
how a microphone works. A microphone converts sound into an electrical
signal. At any instant the intensity of the current characterizes the signal.
The variations in this signal from instant to instant, over some time interval,
completely specify the sound. To make a computer sing, all one has to


107 COMMUNICATION THEORY 


do is to generate this signal on a computer. This signal can be converted
into the sound it represents by feeding it into a speaker, or it can be recorded
on magnetic tape for later play-back.
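
As a minimal sketch of this idea, the following Python fragment computes such a signal sample by sample and stores it in a sound file, which here takes the place of the magnetic tape. The sampling rate, the pitch of 220 Hz, the relative strengths of the harmonics and the file name are all illustrative assumptions, not values from the experiment described in this paper.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 8000  # samples per second (assumed value)

def synthesize_note(freq_hz, duration_s, harmonics=(1.0, 0.5, 0.25)):
    """Return a list of samples in [-1, 1] for one steady note.

    harmonics[k] is the assumed relative amplitude of the (k+1)-th harmonic.
    """
    n = int(SAMPLE_RATE * duration_s)
    total = sum(harmonics)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        value = sum(a * math.sin(2 * math.pi * (k + 1) * freq_hz * t)
                    for k, a in enumerate(harmonics))
        samples.append(value / total)  # keep the signal within [-1, 1]
    return samples

def write_wav(path, samples):
    """Store the samples as 16-bit mono PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        f.writeframes(frames)

note = synthesize_note(220.0, 0.5)
out_path = os.path.join(tempfile.gettempdir(), "note.wav")
write_wav(out_path, note)
```

A steady sum of harmonics of this kind gives only a flat tone. Ornaments such as andolit, gamak or meend would require the pitch and amplitude to vary with time, which this sketch does not attempt.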


The question now is, can we generate this signal synthetically, by
mathematical means? Nearly four years ago in the U.S. the sound of the trumpet
was synthetically generated with such fidelity that even professional musicians
were not able to distinguish the synthetic sound from the sound of a real
trumpet. What has been done for the trumpet should be possible for other
instruments including the human voice. (And I will play here a tape,
recorded synthetically on a computer. This recording was done at the Tata
Institute of Fundamental Research in Bombay by Dr. Ramasubramanian,
and it is with his kind courtesy that we are able to listen to it.)


What we just heard was of very poor quality. Yet it is sufficient to 
prove the feasibility of synthetic vocal music on a computer. With better
laboratory facilities to analyse the harmonics and other characteristics of
notes sung straight or andolit (swing), I have no doubt that the quality of 
synthetic music could be made satisfactory. And we can also introduce 
gamak or meend. The melody that you heard was specified note by note to 
the computer. And this brings us to the second question: Can we make a 
computer compose music? The answer to this is ‘yes’, and I will indicate 
how this can be done. 

For purposes of illustration let us look at the alap-s and tana-s for
Raga Malkauns from Vinayakrao Patwardhan’s Raga-Vigyan, Part III.
From these we construct a conditional frequency distribution, or what can
also be called a transition matrix. Table 1 shows this.* The first row
indicates the number of times a musical phrase began with each of the
different notes. The subsequent rows specify the frequencies of the various
notes following any given note. With the help of this matrix we could generate
musical phrases by drawing random samples in what is known as a Monte Carlo
simulation.
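
The counting that produces such a matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three phrases below are invented for the example and are not taken from Raga-Vigyan, and the names `BEGIN` and `transition_counts` are hypothetical.

```python
from collections import defaultdict

BEGIN = "BEGIN"  # label of the first row: notes that open a phrase

def transition_counts(phrases):
    """Count how often each note follows each other note.

    phrases is a list of note sequences, e.g. ["S", "G", "M"].
    Returns {current: {next: count}}, with a BEGIN row for opening notes.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for phrase in phrases:
        previous = BEGIN
        for note in phrase:
            counts[previous][note] += 1
            previous = note
    return {row: dict(cols) for row, cols in counts.items()}

# Hypothetical phrases for illustration only, not taken from Raga-Vigyan.
phrases = [list("SGMDNS"), list("DNSGS"), list("SGMGS")]
matrix = transition_counts(phrases)

for row, cols in matrix.items():
    print(row, cols)
```

Each row of the result plays the role of one row of the table: the keys are the notes that followed, and the values are the number of times they did so.
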
In Figure 1, we have 11 stacks of cards. Each stack bears a label which
corresponds to one of the 11 rows of Table 1. For example, stack 1 bears the
label “Begin a Phrase”. Stack 2 has the label D and so on. The number of
cards in each stack corresponds to the total number of entries in that row in
Table 1. For example, stack 1 has 37 cards, stack 2 has 4 cards, stack 3
has 24 cards and so on. The 37 cards of stack 1 consist of 2 cards with D, 6
cards with N, 6 with S, 5 with G etc. as per the first row in Table 1. All the
other stacks also contain notes corresponding to the particular row in
Table 1.


To generate a phrase, we first shuffle the cards in stack 1 and draw from
that a card. Suppose that the card bears a D; we write this down and replace
the card. We should now draw a card from the stack that bears the label D,
namely stack 2, which holds the conditional frequency distribution of the
notes following D. If, after shuffling, we draw from this a card which
turns out to be D, we should then once again draw from the same stack a
card. This time we get S. Now we must draw from the stack that bears
label S, namely stack 9, and we draw a G. We continue this procedure till
we have found a satisfactory termination of the phrase. Thus we generate
the following phrase:

DDSGSNMDNS

* For other transition matrices cf. B.C. Deva, Psychoacoustics of Music and Speech,
p. 169 ff.
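
The card-drawing procedure can be imitated on a computer with a random number generator. The sketch below is a hedged illustration, not the program used for this paper: the counts in `TABLE` are hypothetical (only the four opening counts echo the figures quoted above), the names are invented, and a fixed phrase length stands in for the "satisfactory termination", which the procedure leaves to judgement.

```python
import random

# Hypothetical counts for illustration; the real entries would come from Table 1.
TABLE = {
    "BEGIN": {"D": 2, "N": 6, "S": 6, "G": 5},
    "D": {"D": 1, "N": 2, "S": 1},
    "N": {"S": 3, "D": 1},
    "S": {"G": 4, "N": 2},
    "G": {"S": 3, "M": 2},
    "M": {"D": 2, "G": 3},
}

def draw(stack, rng):
    """Draw one card from a stack; the weights play the role of card counts."""
    notes = list(stack)
    weights = [stack[n] for n in notes]
    return rng.choices(notes, weights=weights, k=1)[0]

def generate_phrase(table, length, rng):
    """Generate a phrase of fixed length by repeated draws."""
    phrase = []
    label = "BEGIN"
    for _ in range(length):
        note = draw(table[label], rng)
        phrase.append(note)
        label = note  # the next draw comes from the stack bearing this label
    return phrase

rng = random.Random(0)  # fixed seed so that the run is repeatable
phrase = generate_phrase(TABLE, 10, rng)
print("".join(phrase))
```

Because every card is returned to its stack before the next draw, a weighted random choice is equivalent to shuffling and drawing by hand.
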


[Figure 1: stacks of cards, showing for each stack its stack number, label and number of cards. Stack 1 is labelled BEGIN A PHRASE and holds 37 cards.]




Table 1

Conditional Frequency of Next Note
(Malkauns - from Sangit-Vigyan - by V.N. Patwardhan)

Next Note





Table 2 shows some phrases generated in this manner using the
distribution of Table 1. In a similar manner we can also generate the time
structure of the melody, and tal can also be incorporated.


Table 2


SANGEET NATAK 110 


Now the question is, ‘Is Table 1 a fair description of the style of Malkauns
of Raga-Vigyan? Or should we use a more complex analysis of the
structure?’ I do not know the answer to this question. One can only arrive
at an answer after actual experimentation. An example of a slightly more
complex framework is shown in Table 3. Here we use, besides individual
notes, some phrases also. Musical phrases generated from this matrix are
shown in Table 4.


Table 4

1. S DNDMGGMDNDMGGGS
3. NDMGMGS
4. NSGCHDNS
5.


When do we consider that a particular structural framework is a
satisfactory one? One test would be to generate lots of synthetic music
and listen to it to see if it sounds right. Such experimentation requires
a large amount of computational work which could not possibly be done
without the aid of computers.


MDDDMMNS


But before one does all this work, one must ask, ‘What is it that one
hopes to learn from such quantitative descriptions of a raga?’


(1) We can analyse a performance of a particular musician and 
construct a transition matrix. We can repeat this for different performances 
of the same raga by the same musician. These matrices from different 
performances would be different from each other. From these matrices 
we can construct a mean or an average matrix and also a variance matrix 
which represents variations from one performance to another. We can 
do this for different artists, and then try to see the similarities and differences 
between different artists. In particular, we can measure what is known as 
the Mahalanobis Distance between any two musicians and check if artists 
belonging to the same gharana are closer to each other than artists from 
different gharana-s. We can also compare a great artist with a not so great 
artist and see if we can identify the qualities of greatness.* 


(2) On the other hand we could construct the transition matrices
for different raga-s of the same musician and try to see if there is any
constancy in structure. We might be able to identify the characteristic stamp
of a particular musician.


Table 3: Conditional Frequency of Next Phrase

[Table 3: matrix of conditional frequencies, with NEXT PHRASE across the columns; the entries are not legible.]


(3) We can try to compare structures of raga-s which are close to 
each other and to identify the region within which the chhaya of a 
neighbouring raga does not intrude. 


(4) Finally we may apply this technique to the study of ethnic music. 
For example we may try to see if fishermen from different parts of the 
country have a common music. 


(5) One can go on like this, but the essential point I want to make
is that, if we can quantify structure or form or style in music, we can
gain enormous insights into musical aesthetics. The rewards seem so large
that we must attempt to do this.
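

The comparison of artists proposed in (1) can be sketched as follows. This is a simplified illustration under loud assumptions: each transition matrix is flattened into a list of relative frequencies, the six performances are invented data, and the distance uses only the entry-by-entry variances, which is the Mahalanobis Distance for the special case of a diagonal covariance matrix. The full form would use the complete covariance matrix and needs many more performances than entries to be estimated reliably.

```python
import math

def mean_vector(performances):
    """Entry-by-entry mean over several flattened transition matrices."""
    n = len(performances)
    return [sum(col) / n for col in zip(*performances)]

def variance_vector(performances):
    """Entry-by-entry sample variance across performances."""
    n = len(performances)
    mean = mean_vector(performances)
    return [sum((x - m) ** 2 for x in col) / (n - 1)
            for col, m in zip(zip(*performances), mean)]

def diagonal_mahalanobis(a, b, variances, eps=1e-9):
    """Distance between mean vectors a and b, scaled by per-entry variance.

    eps guards against entries that never vary between performances.
    """
    return math.sqrt(sum((x - y) ** 2 / (v + eps)
                         for x, y, v in zip(a, b, variances)))

# Hypothetical data: three matrix entries, three performances per artist.
artist_a = [[0.50, 0.30, 0.20], [0.54, 0.28, 0.18], [0.46, 0.32, 0.22]]
artist_b = [[0.30, 0.40, 0.30], [0.34, 0.38, 0.28], [0.26, 0.42, 0.32]]

# Pool the two artists' variances entry by entry.
pooled = [(va + vb) / 2 for va, vb in
          zip(variance_vector(artist_a), variance_vector(artist_b))]
distance = diagonal_mahalanobis(mean_vector(artist_a),
                                mean_vector(artist_b), pooled)
print(round(distance, 2))
```

With distances of this kind computed for every pair of artists, one could check whether artists of the same gharana lie closer together than artists of different gharana-s.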


*For suggestions regarding the usefulness of such techniques, cf. Deva, Psychoacoustics
of Music and Speech, pp. 173 ff.


